Reference Line Extraction from Form Documents with Complicated Backgrounds

Authors

Dihua Xi

Seong-Whan Lee

Abstract

Form document analysis is one of the most essential tasks in document analysis and recognition. Within it, one of the most fundamental and crucial steps is the extraction of reference lines, which appear in almost all form documents. This paper presents an efficient methodology for processing grey-level form images with complicated backgrounds. We construct a non-orthogonal wavelet with adjustable rectangular supports and present algorithms that extract the reference lines by applying the strip growth method to the multiresolution wavelet sub-images. We compare this system with the popular Hough transform (HT) based methods and the novel orthogonal wavelet based methods. In the experiments, the proposed algorithm achieves high performance and fast speed on complicated form images. The system is also effective for form images with slight skew.
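For orientation, the Hough transform baseline named in the abstract can be sketched in a few lines. This is a generic textbook version of the technique, not the authors' wavelet or strip growth method, and it assumes the image has already been binarised. The function names `hough_lines` and `strongest_line` are hypothetical and chosen only for this illustration.

```python
import math


def hough_lines(binary, n_theta=180):
    """Accumulate (rho, theta) votes for every foreground pixel.

    `binary` is a list of rows; any truthy value is a foreground pixel.
    A line is parameterised as rho = x*cos(theta) + y*sin(theta).
    """
    height, width = len(binary), len(binary[0])
    max_rho = int(math.ceil(math.hypot(height, width)))
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    # Rows index rho in [-max_rho, max_rho]; columns index theta.
    acc = [[0] * n_theta for _ in range(2 * max_rho + 1)]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            for t, theta in enumerate(thetas):
                rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
                acc[rho + max_rho][t] += 1
    return acc, thetas, max_rho


def strongest_line(binary, n_theta=180):
    """Return (rho, theta in degrees, votes) for the accumulator peak."""
    acc, thetas, max_rho = hough_lines(binary, n_theta)
    votes, rho, t = max(
        (count, r - max_rho, t)
        for r, row in enumerate(acc)
        for t, count in enumerate(row)
    )
    return rho, math.degrees(thetas[t]), votes


if __name__ == "__main__":
    # Synthetic 20 x 40 "form" with one horizontal reference line at y = 7.
    image = [[0] * 40 for _ in range(20)]
    for x in range(40):
        image[7][x] = 1
    print(strongest_line(image))
```

Every foreground pixel votes for all lines passing through it, so the cost grows with the number of foreground pixels times the number of angles tested. On a cluttered grey-level background that cost, together with spurious peaks from non-line pixels, is the kind of limitation that motivates alternatives such as the wavelet-based approach described in the abstract.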


Similar resources

Microsoft Word - john_icita_ell.rtf

In this paper, we present a fast and robust ellipse extraction method. The proposed method can extract ellipses with high accuracy and speed from images with complicated backgrounds. It consists of two parts. First, we extract arc segments from an ellipse approximated by short straight lines that are extracted by a fast line extraction algorithm. Second, the arc segments are used to calculate ...


Basic Test Framework for the Evaluation of Text Line Segmentation and Text Parameter Extraction

Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is a key because inaccurately segmented text lines will lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting. Hence, text line segmentation is a leading challenge in handwritten document image processi...


Stroke-model-based character extraction from gray-level document images

Global gray-level thresholding techniques such as Otsu's method, and local gray-level thresholding techniques such as edge-based segmentation or the adaptive thresholding method are powerful in extracting character objects from simple or slowly varying backgrounds. However, they are found to be insufficient when the backgrounds include sharply varying contours or fonts in different sizes. A str...


Automatic Detection of Font Size Straight from Run Length Compressed Text Documents

Automatic detection of font size finds many applications in the area of intelligent OCRing and document image analysis, which has been traditionally practised over uncompressed documents, although in real life the documents exist in compressed form for efficient storage and transmission. It would be novel and intelligent if the task of font size detection could be carried out directly from the ...


An Efficient Recognition and Data Extraction Method for Table-Form Documents

In Asia, many documents processed in offices are table-form documents. Hence the automatic processing of table-form documents is an important issue of the office automation research. In this paper, we propose an efficient representation method for table-form documents. The representation method is based on three types of line segments. The line segments are normalized and sorted, hence the repr...




Publication date: 2003
